{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial 14: Multiagent\n",
    "\n",
    "This tutorial covers the implementation of multiagent experiments in Flow. It assumes some level of knowledge or experience in writing custom environments and running experiments with RLlib. The rest of the tutorial is organized as follows. Section 1 describes the procedure through which custom environments can be augmented to generate multiagent environments. Then, section 2 walks you through an example of running a multiagent environment\n",
    "in RLlib."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Creating a Multiagent Environment Class\n",
    "\n",
    "In this part we will be setting up steps to create a multiagent environment. We begin by importing the abstract multi-agent evironment class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the base Multi-agent environment \n",
    "from flow.envs.multiagent.base import MultiEnv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In multiagent experiments, the agent can either share a policy (\"shared policy\") or have different policies (\"non-shared policy\"). In the following subsections, we describe the two."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 Shared policies\n",
    "In the multi-agent environment with a shared policy, different agents will use the same policy. \n",
    "\n",
    "We define the environment class, and inherit properties from the Multi-agent version of base env."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SharedMultiAgentEnv(MultiEnv):\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This environment will provide the interface for running and modifying the multiagent experiment. Using this class, we are able to start the simulation (e.g. in SUMO), provide a network to specify a configuration and controllers, perform simulation steps, and reset the simulation to an initial configuration.\n",
    "\n",
    "For the multi-agent experiments, certain functions of the `MultiEnv` will be changed according to the agents. Some functions will be defined according to a *single* agent, while the other functions will be defined according to *all* agents.\n",
    "\n",
    "In the follwing functions, observation space and action space will be defined for a *single* agent (not all agents):\n",
    "\n",
    "* **observation_space**\n",
    "* **action_space**\n",
    "\n",
    "For instance, in a multiagent traffic light grid, if each agents is considered as a single intersection controlling the traffic lights of the intersection, the observation space can be define as *normalized* velocities and distance to a *single* intersection for nearby vehicles, that is defined for every intersection.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def observation_space(self):\n",
    "        \"\"\"State space that is partially observed.\n",
    "\n",
    "        Velocities and distance to intersections for nearby\n",
    "        vehicles ('num_observed') from each direction.\n",
    "        \"\"\"\n",
    "        tl_box = Box(\n",
    "            low=0.,\n",
    "            high=1,\n",
    "            shape=(2 * 4 * self.num_observed),\n",
    "            dtype=np.float32)\n",
    "        return tl_box"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The action space can be defined for a *single* intersection as follows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def action_space(self):\n",
    "        \"\"\"See class definition.\"\"\"\n",
    "        if self.discrete: \n",
    "            # each intersection is an agent, and the action is simply 0 or 1. \n",
    "            # - 0 means no-change in the traffic light \n",
    "            # - 1 means switch the direction\n",
    "            return Discrete(2)\n",
    "        else:\n",
    "            return Box(low=0, high=1, shape=(1,), dtype=np.float32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conversely, the following functions (including their return values) will be defined to take into account *all* agents:\n",
    "\n",
    "* **apply_rl_actions**\n",
    "* **get_state**\n",
    "* **compute_reward**\n",
    "\n",
    "Instead of calculating actions, state, and reward for a single agent, in these functions, the ctions, state, and reward will be calculated for all the agents in the system. To do so, we create a dictionary with agent ids as keys and different parameters (actions, state, and reward ) as vaules. For example, in the following `_apply_rl_actions` function, based on the action of intersections (0 or 1), the state of the intersections' traffic lights will be changed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SharedMultiAgentEnv(MultiEnv): \n",
    "    def _apply_rl_actions(self, rl_actions):\n",
    "        for agent_name in rl_actions:\n",
    "            action = rl_actions[agent_name]\n",
    "            # check if the action space is discrete\n",
    "            \n",
    "            # check if our timer has exceeded the yellow phase, meaning it\n",
    "            # should switch to red\n",
    "            if self.currently_yellow[tl_num] == 1:  # currently yellow\n",
    "                self.last_change[tl_num] += self.sim_step\n",
    "                if self.last_change[tl_num] >= self.min_switch_time: # check if our timer has exceeded the yellow phase, meaning it\n",
    "                # should switch to red\n",
    "                    if self.direction[tl_num] == 0:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(tl_num),\n",
    "                            state=\"GrGr\")\n",
    "                    else:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(tl_num),\n",
    "                            state='rGrG')\n",
    "                    self.currently_yellow[tl_num] = 0\n",
    "            else:\n",
    "                if action:\n",
    "                    if self.direction[tl_num] == 0:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(tl_num),\n",
    "                            state='yryr')\n",
    "                    else:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(tl_num),\n",
    "                            state='ryry')\n",
    "                    self.last_change[tl_num] = 0.0\n",
    "                    self.direction[tl_num] = not self.direction[tl_num]\n",
    "                    self.currently_yellow[tl_num] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, the `get_state` and `compute_reward` methods support the dictionary structure and add the observation and reward, respectively, as a value for each correpsonding key, that is agent id. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SharedMultiAgentEnv(MultiEnv): \n",
    "\n",
    "    def get_state(self):\n",
    "        \"\"\"Observations for each intersection\n",
    "\n",
    "        :return: dictionary which contains agent-wise observations as follows:\n",
    "        - For the self.num_observed number of vehicles closest and incomingsp\n",
    "        towards traffic light agent, gives the vehicle velocity and distance to\n",
    "        intersection.\n",
    "        \"\"\"\n",
    "        # Normalization factors\n",
    "        max_speed = max(\n",
    "            self.k.network.speed_limit(edge)\n",
    "            for edge in self.k.network.get_edge_list())\n",
    "        max_dist = max(grid_array[\"short_length\"], grid_array[\"long_length\"],\n",
    "                       grid_array[\"inner_length\"])\n",
    "\n",
    "        # Observed vehicle information\n",
    "        speeds = []\n",
    "        dist_to_intersec = []\n",
    "\n",
    "        for _, edges in self.network.node_mapping:\n",
    "            local_speeds = []\n",
    "            local_dists_to_intersec = []\n",
    "            # .... More code here (removed for simplicity of example)\n",
    "            # ....\n",
    "\n",
    "            speeds.append(local_speeds)\n",
    "            dist_to_intersec.append(local_dists_to_intersec)\n",
    "            \n",
    "        obs = {}\n",
    "        for agent_id in self.k.traffic_light.get_ids():\n",
    "            # .... More code here (removed for simplicity of example)\n",
    "            # ....\n",
    "            observation = np.array(np.concatenate(speeds, dist_to_intersec))\n",
    "            obs.update({agent_id: observation})\n",
    "        return obs\n",
    "\n",
    "\n",
    "    def compute_reward(self, rl_actions, **kwargs):\n",
    "        if rl_actions is None:\n",
    "            return {}\n",
    "\n",
    "        if self.env_params.evaluate:\n",
    "            rew = -rewards.min_delay_unscaled(self)\n",
    "        else:\n",
    "            rew = -rewards.min_delay_unscaled(self) \\\n",
    "                  + rewards.penalize_standstill(self, gain=0.2)\n",
    "\n",
    "        # each agent receives reward normalized by number of lights\n",
    "        rew /= self.num_traffic_lights\n",
    "\n",
    "        rews = {}\n",
    "        for rl_id in rl_actions.keys():\n",
    "            rews[rl_id] = rew\n",
    "        return rews"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Non-shared policies\n",
    "\n",
    "In the multi-agent environment with a non-shared policy, different agents will use different policies. In what follows we will see the two agents in a ring road using two different policies, 'adversary' and 'av' (non-adversary).\n",
    "\n",
    "Similarly to the shared policies, observation space and action space will be defined for a *single* agent (not all agents):\n",
    "\n",
    "* **observation_space**\n",
    "* **action_space**\n",
    "\n",
    "And, the following functions (including their return values) will be defined to take into account *all* agents::\n",
    "\n",
    "* **apply_rl_actions**\n",
    "* **get_state**\n",
    "* **compute_reward**\n",
    "\n",
    "\\* Note that, when observation space and action space will be defined for a single agent, it means that all agents should have the same dimension (i.e. space) of observation and action, even when their policise are not the same. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us start with defining  `apply_rl_actions` function. In order to make it work for a non-shared policy multi-agent ring road, we define `rl_actions` as a combinations of each policy actions plus the `perturb_weight`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class NonSharedMultiAgentEnv(MultiEnv):\n",
    "    def _apply_rl_actions(self, rl_actions):\n",
    "        # the names of all autonomous (RL) vehicles in the network\n",
    "        agent_ids = [\n",
    "            veh_id for veh_id in self.sorted_ids\n",
    "            if veh_id in self.k.vehicle.get_rl_ids()\n",
    "        ]\n",
    "        # define different actions for different multi-agents \n",
    "        av_action = rl_actions['av']\n",
    "        adv_action = rl_actions['adversary']\n",
    "        perturb_weight = self.env_params.additional_params['perturb_weight']\n",
    "        rl_action = av_action + perturb_weight * adv_action\n",
    "        \n",
    "        # use the base environment method to convert actions into accelerations for the rl vehicles\n",
    "        self.k.vehicle.apply_acceleration(agent_ids, rl_action)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the `get_state` method, we define the state for each of the agents. Remember, the sate of the agents can be different. For the purpose of this example and simplicity, we define the state of the adversary and non-adversary agent to be the same. \n",
    "\n",
    "In the `compute_reward` method, the agents receive opposing speed rewards. The reward of the adversary agent is more when the speed of the vehicles is small, while the non-adversary agent tries to increase the speeds of the vehicles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "class NonSharedMultiAgentEnv(MultiEnv):\n",
    "    def get_state(self, **kwargs):\n",
    "        state = np.array([[\n",
    "            self.k.vehicle.get_speed(veh_id) / self.k.network.max_speed(),\n",
    "            self.k.vehicle.get_x_by_id(veh_id) / self.k.network.length()\n",
    "        ] for veh_id in self.sorted_ids])\n",
    "        state = np.ndarray.flatten(state)\n",
    "        return {'av': state, 'adversary': state}\n",
    "\n",
    "    def compute_reward(self, rl_actions, **kwargs):\n",
    "        if self.env_params.evaluate:\n",
    "            reward = np.mean(self.k.vehicle.get_speed(\n",
    "                self.k.vehicle.get_ids()))\n",
    "            return {'av': reward, 'adversary': -reward}\n",
    "        else:\n",
    "            reward = rewards.desired_velocity(self, fail=kwargs['fail'])\n",
    "            return {'av': reward, 'adversary': -reward}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Running Multiagent Environment in RLlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When running the experiment that uses a multiagent environment, we specify certain parameters in the `flow_params` dictionary. \n",
    "\n",
    "Similar to any other experiments, the following snippets of codes will be inserted into a blank python file (e.g. `new_multiagent_experiment.py`, and should be saved under `flow/examples/exp_configs/rl/multiagent/` directory. (all the basic imports and initialization of variables are omitted in this example for brevity)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "from flow.envs.multiagent import MultiWaveAttenuationPOEnv\n",
    "from flow.networks import MultiRingNetwork\n",
    "from flow.core.params import SumoParams, EnvParams, NetParams, VehicleParams, InitialConfig\n",
    "from flow.controllers import ContinuousRouter, IDMController, RLController\n",
    "\n",
    "# time horizon of a single rollout\n",
    "HORIZON = 3000\n",
    "# Number of rings\n",
    "NUM_RINGS = 1\n",
    "\n",
    "vehicles = VehicleParams()\n",
    "for i in range(NUM_RINGS):\n",
    "    vehicles.add(\n",
    "        veh_id='human_{}'.format(i),\n",
    "        acceleration_controller=(IDMController, {\n",
    "            'noise': 0.2\n",
    "        }),\n",
    "        routing_controller=(ContinuousRouter, {}),\n",
    "        num_vehicles=21)\n",
    "    vehicles.add(\n",
    "        veh_id='rl_{}'.format(i),\n",
    "        acceleration_controller=(RLController, {}),\n",
    "        routing_controller=(ContinuousRouter, {}),\n",
    "        num_vehicles=1)\n",
    "\n",
    "flow_params = dict(\n",
    "    # name of the experiment\n",
    "    exp_tag='multiagent_ring_road',\n",
    "\n",
    "    # name of the flow environment the experiment is running on\n",
    "    env_name=MultiWaveAttenuationPOEnv,\n",
    "\n",
    "    # name of the network class the experiment is running on\n",
    "    network=MultiRingNetwork,\n",
    "\n",
    "    # simulator that is used by the experiment\n",
    "    simulator='traci',\n",
    "\n",
    "    # sumo-related parameters (see flow.core.params.SumoParams)\n",
    "    sim=SumoParams(\n",
    "        sim_step=0.1,\n",
    "        render=False,\n",
    "    ),\n",
    "\n",
    "    # environment related parameters (see flow.core.params.EnvParams)\n",
    "    env=EnvParams(\n",
    "        horizon=HORIZON,\n",
    "        warmup_steps=750,\n",
    "        additional_params={\n",
    "            'max_accel': 1,\n",
    "            'max_decel': 1,\n",
    "            'ring_length': [230, 230],\n",
    "            'target_velocity': 4\n",
    "        },\n",
    "    ),\n",
    "\n",
    "    # network-related parameters \n",
    "    net=NetParams(\n",
    "        additional_params={\n",
    "            'length': 230,\n",
    "            'lanes': 1,\n",
    "            'speed_limit': 30,\n",
    "            'resolution': 40,\n",
    "            'num_rings': NUM_RINGS\n",
    "        },\n",
    "    ),\n",
    "\n",
    "    # vehicles to be placed in the network at the start of a rollout\n",
    "    veh=vehicles,\n",
    "\n",
    "    # parameters specifying the positioning of vehicles upon initialization/\n",
    "    # reset\n",
    "    initial=InitialConfig(bunching=20.0, spacing='custom'),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we run the following code to create the environment "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from flow.utils.registry import make_create_env\n",
    "from ray.tune.registry import register_env\n",
    "\n",
    "create_env, env_name = make_create_env(params=flow_params, version=0)\n",
    "\n",
    "# Register as rllib env\n",
    "register_env(env_name, create_env)\n",
    "\n",
    "test_env = create_env()\n",
    "obs_space = test_env.observation_space\n",
    "act_space = test_env.action_space"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Shared policies\n",
    "\n",
    "When we run a shared-policy multiagent experiment, we refer to the same policy for each agent. In the example below the agents will use 'av' policy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ray.rllib.agents.ppo.ppo_policy import PPOTFPolicy\n",
    "\n",
    "def gen_policy():\n",
    "    \"\"\"Generate a policy in RLlib.\"\"\"\n",
    "    return PPOTFPolicy, obs_space, act_space, {}\n",
    "\n",
    "\n",
    "# Setup PG with an ensemble of `num_policies` different policy graphs\n",
    "POLICY_GRAPHS = {'av': gen_policy()}\n",
    "\n",
    "\n",
    "def policy_mapping_fn(_):\n",
    "    \"\"\"Map a policy in RLlib.\"\"\"\n",
    "    return 'av'\n",
    "\n",
    "\n",
    "POLICIES_TO_TRAIN = ['av']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Non-shared policies\n",
    "\n",
    "When we run the non-shared multiagent experiment, we refer to different policies for each agent. In the example below, the policy graph will have two policies, 'adversary' and 'av' (non-adversary)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gen_policy():\n",
    "    \"\"\"Generate a policy in RLlib.\"\"\"\n",
    "    return PPOTFPolicy, obs_space, act_space, {}\n",
    "\n",
    "\n",
    "# Setup PG with an ensemble of `num_policies` different policy graphs\n",
    "POLICY_GRAPHS = {'av': gen_policy(), 'adversary': gen_policy()}\n",
    "\n",
    "\n",
    "def policy_mapping_fn(agent_id):\n",
    "    \"\"\"Map a policy in RLlib.\"\"\"\n",
    "    return agent_id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lastly, just like any other experiments, we run our code using `train_rllib.py` as follows:\n",
    "\n",
    "    python flow/examples/train_rllib.py new_multiagent_experiment.py"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
